Dis-S2V: Discourse Informed Sen2Vec
Abstract
Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representations of sentences learned by neural models from unlabeled data have been shown to outperform the traditional bag-of-words representation. However, most of these learning methods consider only the content of a sentence and largely disregard the relations among sentences in a discourse. In this paper, we propose a series of novel models for learning latent representations of sentences (Sen2Vec) that consider the content of a sentence as well as inter-sentence relations. We first represent the inter-sentence relations with a language network and then use the network to induce contextual information into the content-based Sen2Vec models. Two different approaches are introduced to exploit the information in the network. Our first approach retrofits (already trained) Sen2Vec vectors with respect to the network in two different ways: (i) using the adjacency relations of a node, and (ii) using a stochastic sampling method, which is more flexible in sampling the neighbors of a node. The second approach uses a regularizer to encode the information in the network into the existing Sen2Vec model. Experimental results show that our proposed models outperform existing methods in three fundamental information system tasks, demonstrating the effectiveness of our approach. The models leverage the computational power of multi-core CPUs to achieve fine-grained computational efficiency. We will make our code publicly available upon acceptance.
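The adjacency-based retrofitting idea in (i) can be illustrated with a minimal sketch. The abstract does not state the objective or the update rule, so the code below assumes the commonly used iterative retrofitting update, in which each vector is repeatedly replaced by a weighted average of its original value and its neighbors' current values. The function name `retrofit`, the weights `alpha` and `beta`, and the toy graph are hypothetical and are not taken from the paper.

```python
import numpy as np


def retrofit(vectors, neighbors, alpha=1.0, beta=1.0, iterations=10):
    """Pull each vector toward its graph neighbors while keeping it
    close to its original (pre-trained) value.

    ASSUMPTION: this is the generic iterative retrofitting update,
    not necessarily the exact procedure used in the paper.
    """
    original = np.asarray(vectors, dtype=float)
    updated = original.copy()
    for _ in range(iterations):
        for node, nbrs in neighbors.items():
            if not nbrs:
                continue  # isolated nodes keep their original vector
            neighbor_sum = updated[list(nbrs)].sum(axis=0)
            # Weighted average of the original vector and current neighbors.
            updated[node] = (alpha * original[node] + beta * neighbor_sum) / (
                alpha + beta * len(nbrs)
            )
    return updated


# Toy example: sentences 0 and 1 are linked in the network; sentence 2 is isolated.
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
neighbors = {0: [1], 1: [0], 2: []}
retrofitted = retrofit(vectors, neighbors)
```

In this toy graph the two linked sentence vectors move toward each other, while the isolated one is left unchanged. The stochastic sampling variant in (ii) would presumably replace the fixed neighbor lists with sampled ones, but the abstract gives no details, so it is not sketched here.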
Related resources
Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
We present a novel approach to learn distributed representation of sentences from unlabeled data by modeling both content and context of a sentence. The content model learns sentence representation by predicting its words. On the other hand, the context model comprises a neighbor prediction component and a regularizer to model distributional and proximity hypotheses, respectively. We propose an...
Benchmarking Still-to-Video Face Recognition via Partial and Local Linear Discriminant Analysis on COX-S2V Dataset
In this paper, we explore the real-world Still-to-Video (S2V) face recognition scenario, where only very few (single, in many cases) still images per person are enrolled into the gallery while it is usually possible to capture one or multiple video clips as probe. Typical application of S2V is mug-shot based watch list screening. Generally, in this scenario, the still image(s) were collected un...
Serro 2 Virus Highlights the Fundamental Genomic and Biological Features of a Natural Vaccinia Virus Infecting Humans
Vaccinia virus (VACV) has been implicated in infections of dairy cattle and humans, and outbreaks have substantially impacted local economies and public health in Brazil. During a 2005 outbreak, a VACV strain designated Serro 2 virus (S2V) was collected from a 30-year-old male milker. Our aim was to phenotypically and genetically characterize this VACV Brazilian isolate. S2V produced small roun...
An Analysis of Iranian EFL Learners’ Dis-preferred Responses in Interactional Discourse
The present study, on the one hand, attempted to investigate the strategies applied in dispreferred responses by Iranian university students of English and the extent to which pragmatic transfer could occur. On the other hand, the study aimed to probe into the association between dispreferred organization and turn-shape. To this end, 31 relevant naturally occurring conversations, totaling 120 ...
Evaluating Hierarchical Discourse Segmentation
Hierarchical discourse segmentation is a useful technology, but it is difficult to evaluate. I propose an error measure based on the word error rate of Beeferman et al. (1999). I then show that this new measure not only reliably distinguishes baseline segmentations from lexically-informed hierarchical segmentations and more informed segmentations from less informed segmentations, but it also of...
Journal:
- CoRR
Volume: abs/1610.08078
Publication date: 2016